Disaster recovery processing method and apparatus and storage unit for the same

ABSTRACT

A technique capable of constructing a disaster recovery system reduced in performance degradation of a primary system is provided. The technique includes a step of conducting synchronous writing of log information into a secondary storage subsystem in a secondary system when a write request received from a host computer is a write request of log information, a step of temporarily storing a write request and conducting asynchronous writing into the secondary storage subsystem when the received write request is a write request of database data or status information, a step of modifying log information, data in a database area, and status information in the secondary storage subsystem according to contents of a write request received from a primary storage subsystem, and a step of recovering the database area according to contents of log information in a location indicated by the status information.

CROSS-REFERENCES

This is a continuation application of U.S. Ser. No. 10/650,842, filed Aug. 29, 2003, the entire disclosure of which is hereby incorporated by reference.

BACKGROUND OF THE INVENTION

The present invention relates to a technique for executing processing in another information processing apparatus, or a program or an object that conducts its processing, in response to occurrence of a failure or a predetermined condition, or a request.

For conventional database management systems (or computer systems or information processing systems), there is a technique of placing a plurality of replications in a plurality of geographically distributed sites (computers or information processing apparatuses) by way of precaution against failures, i.e., the so-called disaster recovery technique. In the disaster recovery technique, data of a certain site are stored in other geographically separated sites as replications. In the case where a failure is caused by a disaster or the like in a certain site, business is recovered in another site.

For database management systems (or computer systems or information processing systems, where database management systems are taken as an example), there are several methods as a technique of having such replications. Basically, a request is sent to a system that becomes principal when seen from clients, i.e., a primary system, and an information record called log is generated in the primary system, and used to recover the processing and as backup. In other words, this log record is sent from the primary system to a system called secondary system, and a host computer of the secondary system conducts the same modification processing as the primary system by referring to the log record and thereby modifies the state of the secondary system. Such a technique of implementing the replication by sending a log record generated in the primary system to the secondary system is disclosed in U.S. Pat. No. 5,640,561.

In a remote copy function (see U.S. Pat. No. 5,640,561) in a storage apparatus used when sending a log record or database data generated in a primary system to a secondary system, conventional data transfer methods are broadly divided into the following two kinds.

(1) Synchronous Method

Upon a data write request from a host computer in a certain site (herein referred to as main site), a storage apparatus in the main site transfers pertinent data to a storage apparatus in another site (herein referred to as remote site). After arrival of a receipt report of pertinent data from the storage apparatus in the remote site, the storage apparatus in the main site reports writing completion to a host computer in the main site.

There is a merit that it is assured that data have arrived at the remote site when writing has been completed in the main site. On the other hand, there is a drawback that an increase in distance between sites or line delay increases the write response time in the main site and causes performance degradation.

(2) Asynchronous Method

When a data write request from a host computer in a main site has arrived, a storage apparatus in the main site reports writing completion to the host computer in the main site without waiting for completion of pertinent data transfer to a remote site.

As compared with the synchronous method, the possibility of performance degradation in the main site is reduced. In the case where a failure has occurred in the main site, however, there is a possibility that recent data are lost in the remote site and transactions are lost.
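The difference between the two transfer methods can be illustrated with a minimal sketch. The class and method names (`MainSite`, `RemoteSite`, `write_sync`, `write_async`, `drain`) are hypothetical and are not taken from the cited patent; the network is reduced to a direct method call.

```python
from collections import deque


class RemoteSite:
    """Hypothetical remote storage apparatus that records received data."""

    def __init__(self):
        self.blocks = {}

    def receive(self, address, data):
        self.blocks[address] = data
        return True  # stands in for the receipt report


class MainSite:
    """Hypothetical main-site storage apparatus."""

    def __init__(self, remote):
        self.remote = remote
        self.blocks = {}
        self.pending = deque()  # writes not yet transferred (asynchronous)

    def write_sync(self, address, data):
        # Synchronous method: completion is reported only after the
        # remote site has returned its receipt report.
        self.blocks[address] = data
        acked = self.remote.receive(address, data)
        return "complete" if acked else "error"

    def write_async(self, address, data):
        # Asynchronous method: completion is reported at once and the
        # transfer is deferred.
        self.blocks[address] = data
        self.pending.append((address, data))
        return "complete"

    def drain(self):
        # Deferred transfer, in the original write order.
        while self.pending:
            self.remote.receive(*self.pending.popleft())
```

Under these assumptions, a write issued with `write_async` is missing from the remote site until `drain` runs, which is the window in which a main-site failure loses recent data.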

There are methods in which it is assured that the sequentiality of data writing in the main site coincides with that in the remote site, as disclosed in U.S. Pat. No. 5,640,561, and methods in which it is not assured. To avoid a situation in which the state in the middle of a transaction remains and consistency of a database cannot be assured, it is necessary to assure the sequentiality of data writing. The sequentiality assurance can be configured so as to be effective for a set of a plurality of disks. A technique for assuring the sequentiality for a set of a disk for log (journal) and a disk for DB is disclosed in U.S. Pat. No. 5,640,561.

In general, a database management system (DBMS) has a DB disk for storing data itself and a log disk for storing DB modification history information in a time series form. If a server in the main site (which is a computer or an information processing apparatus in the main site) is shut down, data on the DB disk assumes an incomplete modification state in some cases. At the time of restart of the DBMS, however, a consistent state is recovered on the basis of the DB modification history information on the log disk. (Such a technique is disclosed in Jim Gray and Andreas Reuter, “TRANSACTION PROCESSING: Concepts and Techniques,” published by Morgan Kaufmann Publishers, which is hereafter referred to as reference paper 1.) In other words, in recovery from a server shutdown, modification data of completed transactions are forced to the DB disk (rollforward), and modification data of transactions that were incomplete at the time of the server shutdown are invalidated (rollback).
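The rollforward and rollback steps described above can be sketched as follows. This is a simplified illustration, not the procedure of reference paper 1: the log record layout (`txn`, `kind`, `key`, `old`, `new`) and the function name `restart_recovery` are assumptions, and the sketch reapplies every logged update before undoing those of incomplete transactions.

```python
def restart_recovery(db_disk, log_disk):
    """Bring db_disk to a consistent state using the log (simplified).

    db_disk  : dict mapping key -> value, possibly partially modified
    log_disk : list of log records in time-series order (assumed layout)
    """
    committed = {r["txn"] for r in log_disk if r["kind"] == "commit"}

    # Rollforward: reapply every logged update in time-series order so
    # that all modifications of completed transactions reach the DB disk.
    for r in log_disk:
        if r["kind"] == "update":
            db_disk[r["key"]] = r["new"]

    # Rollback: invalidate updates of transactions that never committed,
    # newest first, by restoring the logged old values.
    for r in reversed(log_disk):
        if r["kind"] == "update" and r["txn"] not in committed:
            db_disk[r["key"]] = r["old"]

    return db_disk


# T1 committed but its update never reached the disk; T2 did not commit
# but its update did reach the disk.
log = [
    {"txn": "T1", "kind": "update", "key": "x", "old": 1, "new": 2},
    {"txn": "T2", "kind": "update", "key": "y", "old": 0, "new": 5},
    {"txn": "T1", "kind": "commit"},
]
recovered = restart_recovery({"x": 1, "y": 5}, log)
```

In this example `recovered` holds the update of T1 and not that of T2.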

As a transfer method in the disaster recovery system, the following methods are known.

(a) Log Synchronous and DB Synchronous Method

The log and DB are transferred synchronously to the remote site. The same states of the log and DB as those in the main site are always present in the remote site. When a failure has occurred, recovery processing in the same situation as the restart in the main site can be implemented. In other words, modification contents of transactions that have been completed in the main site are not lost in the remote site. Since the log and DB are transferred synchronously, however, the performance in the main site is degraded as compared with the case where such a configuration is not adopted.

(b) Log Asynchronous and DB Asynchronous Method

The log and DB are transferred asynchronously to the remote site. In addition, the modification sequentiality in the remote site is guaranteed. Since the modification in the remote site is delayed, the states of the log and DB that existed in the main site one delay time earlier are present in the remote site. Because the modification sequentiality of the log and DB is assured, the consistent DB state that existed in the main site one delay time earlier can be recovered. Although the performance degradation in the main site is slight, modification contents of transactions that have been completed in the main site are sometimes lost in the remote site.

SUMMARY OF THE INVENTION

If in the conventional database management system (or computer system or information processing system, where a database management system is taken as an example) the log and DB (where a database is taken as an example, but data stored to be used for processing may also be used) are transferred to the remote site in synchronism with the main site, the possibility that modification contents of transactions that have been completed in the main site are lost in the remote site is low. Since the log and DB are transferred synchronously, there is a problem that the performance in the main site is degraded as compared with the case where such a configuration is not adopted.

If in the conventional database management system the log and DB are transferred to the remote site asynchronously, then the performance degradation in the main site is slight, but there is a problem that modification contents of transactions that have been completed in the main site are sometimes lost in the remote site.

A first object of the present invention is to provide a technique in which the possibility that modification contents of transactions that have been completed in a main site are lost in a remote site is low, when executing processing in another information processing apparatus, or a program or an object that conducts its processing, in response to occurrence of a failure or a predetermined condition, or a request.

A second object of the present invention is to provide a technique in which the possibility that modification contents of transactions that have been completed in a primary system are lost in a secondary system is low.

A third object of the present invention is to provide a technique in which the performance degradation in the primary system can be reduced.

In a first program for storing processing data to be subject to processing using the first program and log information for recovering the processing in first storage means, and a second program for storing processing data to be subject to processing and log information for recovering the processing in second storage means, an information processing recovery method for recovering the processing using the first program, by executing the processing using the second program when a failure has occurred in the processing using the first program, is first means. In the information processing recovery method, the following processing is executed.

In response to an input write request of the log information, the processing data, and status information indicating a storage location of the log information, the log information, the processing data, and the status information are stored in the first storage means. When the log information has been stored in the second storage means, a response to the write request is sent to the second storage means. When a predetermined condition is satisfied, the processing data and the status information are stored in the second storage means.

In a system in which switching to a second database processing system is conducted when a failure has occurred in a first database processing system and database processing is continued, second means executes the following processing. In response to a write request to the second database processing system, log information is modified by synchronous writing, and database data and status information are modified by asynchronous writing.

In a disaster recovery system in which switching to a secondary database processing system is conducted when a failure has occurred in a primary database processing system and database processing is continued, third means modifies log information by synchronous writing and modifies database data and status information by asynchronous writing, at the time of a write request to the secondary system.

In a disaster recovery system according to the present invention, a host computer includes a database buffer for temporarily holding contents of a database area in a storage subsystem, and a log buffer for temporarily holding contents of modification processing for the database buffer. Contents of the database buffer are modified with the advance of execution of database processing in the host computer. When it has become necessary to force the modification contents to the database area in the storage subsystem, a write request of log information indicating contents of modification processing conducted on the database buffer, database data modified in the database buffer, or status information indicating a location of log information at the time of checkpoint is transmitted from the primary host computer in the primary system to the primary storage subsystem in the primary system.

The primary storage subsystem receives the write request from the host computer. According to contents of the received write request, modification of log information, data in the database area, and status information in the primary storage subsystem is conducted. The primary storage subsystem is previously configured so that a log information disk may be subject to synchronous remote copy and a database data disk and a status information disk may be subject to asynchronous remote copy that guarantees a modification sequentiality over both disks.

According to this configuration, the primary storage subsystem writes a write request for a log information disk into a secondary storage device in the secondary system by using a synchronous method, and writes a write request for a database area data disk and status information disk into the secondary storage device in the secondary system by using an asynchronous method.

The secondary storage subsystem receives a write request of the log information, database data, or status information from the primary storage subsystem. According to contents of the received write request, log information, database area data and status information in the secondary storage subsystem are modified (see U.S. Pat. No. 5,640,561).

If thereafter a failure occurs in the primary database processing system and database processing is started in the secondary database processing system, then log information is read out from a location indicated by the status information, and data in the database area in the secondary storage subsystem is modified according to contents of the log information thus read out. As a result, the database area in the secondary storage subsystem is restored to the consistent state of the database area immediately before the failure occurrence.
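A minimal sketch of this start-up step in the secondary system follows. It assumes, purely for illustration, that the status information is a log record index stored under the key `log_position`, that log records use the fields `txn`, `kind`, `key` and `new`, and that every transaction found after that position also began after it; the function name `start_secondary_dbms` is hypothetical.

```python
def start_secondary_dbms(secondary_db, secondary_status, secondary_log):
    """Roll the lagging database area forward from the checkpoint position.

    secondary_log is complete (copied synchronously); secondary_db and
    secondary_status may lag (copied asynchronously).
    """
    start = secondary_status["log_position"]  # location given by the status
    tail = secondary_log[start:]
    committed = {r["txn"] for r in tail if r["kind"] == "commit"}
    for r in tail:
        # Only modifications of transactions completed in the primary
        # system are applied to the database area.
        if r["kind"] == "update" and r["txn"] in committed:
            secondary_db[r["key"]] = r["new"]
    return secondary_db


# The asynchronous copy delivered only the state as of the checkpoint
# taken after T1; the log also holds T2 (committed) and T3 (incomplete).
log = [
    {"txn": "T1", "kind": "update", "key": "a", "new": 1},
    {"txn": "T1", "kind": "commit"},
    {"txn": "T2", "kind": "update", "key": "b", "new": 2},
    {"txn": "T2", "kind": "commit"},
    {"txn": "T3", "kind": "update", "key": "c", "new": 3},
]
restored = start_secondary_dbms({"a": 1}, {"log_position": 2}, log)
```

Under these assumptions the modification made by T2 survives even though the database area itself was copied with a delay.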

In business processing having a high DB modification ratio, the I/O load on the DB disk becomes high as compared with the log disk. On the other hand, transactions to be recovered in the secondary system depend on the information on the log disk. Therefore, it becomes possible to prevent modification contents of transactions completed in the primary system from being lost by conducting synchronous copy on the information on the log disk, and it becomes possible to construct a disaster recovery system reduced in performance degradation in the primary system by conducting asynchronous copy on the information on the DB disk.

According to the disaster recovery system of the present embodiment, log information is modified by synchronous writing and database data and status information are modified by asynchronous writing, when writing to the secondary system is requested, as heretofore described. Therefore, the contents of modification in transactions completed in the primary system are prevented from being lost in the secondary system. It is possible to construct a disaster recovery system reduced in performance degradation in the primary system.

Other objects, features and advantages of the invention will become apparent from the following description of the embodiments of the invention taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram showing a system configuration of a disaster recovery system in the present embodiment;

FIG. 2 is a diagram showing an outline of synchronous remote copy processing of a log block 262 a in the present embodiment;

FIG. 3 is a diagram showing an outline of asynchronous remote copy processing of DB blocks and status information in the present embodiment;

FIG. 4 is a diagram showing configuration information of a DB-disk mapping table 15;

FIG. 5 is a diagram showing an example of a primary/secondary remote copy management table;

FIG. 6 is a flow chart showing a processing procedure of checkpoint acquisition processing in the present embodiment;

FIG. 7 is a flow chart showing a processing procedure conducted upon receiving a write command in the present embodiment;

FIG. 8 is a flow chart showing a processing procedure conducted upon receiving a read command in the present embodiment;

FIG. 9 is a flow chart showing a processing procedure of data reception processing in a secondary disk subsystem 4 in the present embodiment;

FIG. 10 is a flow chart showing a processing procedure of DBMS start processing in the present embodiment; and

FIG. 11 is a diagram showing an outline of synchronous remote copy processing conducted at the time of checkpoint in the present embodiment.

DESCRIPTION OF THE EMBODIMENTS

Hereafter, a system of an embodiment in which, upon a write request to a secondary system, log information is modified by synchronous writing and database data and status information are modified by asynchronous writing will be described.

FIG. 1 is a diagram showing a system configuration of the present embodiment. As shown in FIG. 1, a primary host computer 1 (which may be implemented by using a computer, an information processing apparatus, or a program or an object capable of conducting the processing) includes a DB access control section 111 (hardware, a program, or an object capable of conducting the processing), a checkpoint processing section 112 (hardware, a program, or an object capable of conducting the processing), a log management section 113 (hardware, a program, or an object capable of conducting the processing), and a DB delay write processing section 114 (hardware, a program, or an object capable of conducting the processing).

The DB access control section 111 is a processing section for controlling access to a primary DB disk 24 (storage means) and a primary log disk 26 (storage means) via a DB buffer 12 (storage means) and a log buffer 14 (storage means). The checkpoint processing section 112 is a processing section for transmitting a write request of all DB blocks modified in the DB buffer 12, and status information indicating a log disk of a latest log record at that time point and its location, from the primary host computer 1 to a primary disk subsystem 2, when it has become necessary to force contents in the DB buffer 12 in the primary host computer 1 to a storage device in the primary disk subsystem 2, which is a disk subsystem in a primary system.

As disclosed in the reference paper 1, some transactions are not complete at the time of a checkpoint. Besides the location of the latest log record, therefore, the status information indicates in some cases the locations of the oldest log records relating to the uncompleted transactions. Modification of the status information on a disk is delayed in some cases. In either case, the status information may be used as information indicating the location of the log where reference is to be started when the database management system restarts.

The log management section 113 is a processing section for transmitting a write request of a log block 262 a, which is log information indicating contents of database processing that has been conducted on the DB buffer 12, from the primary host computer 1 to the primary disk subsystem 2. The DB delay write processing section 114 is a processing section for transmitting a write request of database data on the DB buffer 12 from the primary host computer 1 to the primary disk subsystem 2.

A program for making the primary host computer 1 function as the DB access control section 111, the checkpoint processing section 112, the log management section 113 and the DB delay write processing section 114 is recorded on a recording medium such as a CD-ROM, stored on a magnetic disk or the like, and thereafter loaded into a memory, and executed. The recording medium for recording the program thereon may also be another recording medium other than the CD-ROM. The program may be installed in the information processing apparatus from the recording medium and used, or the recording medium may be accessed through a network to use the program.

The primary disk subsystem 2 (which may be implemented by using a storage unit, a disk system, a computer, an information processing apparatus, or a program or an object capable of conducting the processing) includes a disk control processing section 21 (hardware, a program, or an object capable of conducting the processing), a command processing section 211 (hardware, a program, or an object capable of conducting the processing), a primary remote copy processing section 212 (hardware, a program, or an object capable of conducting the processing), and a disk access control section 23 (hardware, a program, or an object capable of conducting the processing).

The disk control processing section 21 is a control processing section for controlling operation of the whole primary disk subsystem apparatus. The command processing section 211 is a processing section for receiving a write request of a DB block 242 a, the status information or a log block 262 a from the primary host computer 1, and modifying contents of the primary DB disk 24, a primary status disk 25, the primary log disk 26, or a cache memory 22 (storage means) for storing their contents, included in the primary disk subsystem, according to contents of the received write request.

The primary remote copy processing section 212 is a processing section for referring to the primary remote copy management table and conducting synchronous or asynchronous remote copying according to configuration information in the primary remote copy management table. If in the case of the present embodiment the received write request is a write request of the log block 262 a, then the primary remote copy processing section 212 conducts synchronous write processing of the log block 262 a into a secondary disk subsystem 4, which is a disk subsystem of a secondary system (which may be implemented by using a computer, an information processing apparatus, or a program or an object capable of conducting the processing). If the received write request is a write request of the DB block 242 a or status information, then the primary remote copy processing section 212 temporarily stores the write request and conducts asynchronous write processing into the secondary disk subsystem 4. The disk access control section 23 is a processing section for controlling access to respective magnetic disk devices placed under the primary disk subsystem 2.
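The dispatch performed by the primary remote copy processing section 212 can be sketched as follows. The table layout `COPY_MODE`, the class name `PrimaryRemoteCopy`, and the use of a sequence number for the queued writes are assumptions made for illustration; the actual format of the primary remote copy management table is the one shown in FIG. 5.

```python
from collections import deque

# Hypothetical stand-in for the primary remote copy management table.
COPY_MODE = {"log": "sync", "db": "async", "status": "async"}


class PrimaryRemoteCopy:
    def __init__(self, send_to_secondary):
        self.send = send_to_secondary  # callable(disk, block, seq)
        self.queue = deque()           # temporarily stored write requests
        self.next_seq = 0              # one numbering shared by db and status

    def handle_write(self, disk, block):
        if COPY_MODE[disk] == "sync":
            # Log block: copied to the secondary before completion is reported.
            self.send(disk, block, None)
            return "complete after remote copy"
        # DB block or status information: stored temporarily, copied later.
        self.queue.append((self.next_seq, disk, block))
        self.next_seq += 1
        return "complete before remote copy"

    def flush(self):
        # Asynchronous transfer in the order the writes were accepted.
        while self.queue:
            seq, disk, block = self.queue.popleft()
            self.send(disk, block, seq)
```

Sharing one sequence numbering between the DB disk and the status disk is one way to let the secondary side preserve the modification order across both disks.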

A program for making the primary disk subsystem 2 function as the disk control processing section 21, the command processing section 211, the primary remote copy processing section 212 and the disk access control section 23 is recorded on a recording medium such as a floppy disk, and executed. The recording medium for recording the program thereon may also be another recording medium other than the floppy disk. The program may be installed in the information processing apparatus from the recording medium and used, or the recording medium may be accessed through a network to use the program.

A secondary host computer 3 (which may be implemented by using a computer, an information processing apparatus, or a program or an object capable of conducting the processing) includes a DB access control section 311 (hardware, a program, or an object capable of conducting the processing), a checkpoint processing section 312 (hardware, a program, or an object capable of conducting the processing), a log management section 313 (hardware, a program, or an object capable of conducting the processing), and a DB delay write processing section 314 (hardware, a program, or an object capable of conducting the processing).

The DB access control section 311 is a processing section for conducting processing similar to that of the DB access control section 111 in the primary system, at the time of operation of the secondary system. The checkpoint processing section 312 is a processing section for conducting processing similar to that of the checkpoint processing section 112 in the primary system, at the time of operation of the secondary system.

The log management section 313 is a processing section for conducting processing similar to that of the log management section 113 in the primary system, at the time of operation of the secondary system. The DB delay write processing section 314 is a processing section for conducting processing similar to that of the DB delay write processing section 114 in the primary system, at the time of operation of the secondary system.

A program for making the secondary host computer 3 function as the DB access control section 311, the checkpoint processing section 312, the log management section 313 and the DB delay write processing section 314 is recorded on a recording medium such as a CD-ROM, stored on a magnetic disk or the like, and thereafter loaded into a memory, and executed. The recording medium for recording the program thereon may also be another recording medium other than the CD-ROM. The program may be installed in the information processing apparatus from the recording medium and used, or the recording medium may be accessed through a network to use the program.

A secondary disk subsystem 4 (which may be implemented by using a storage unit, a disk system, a computer, an information processing apparatus, or a program or an object capable of conducting the processing) includes a disk control processing section 41 (hardware, a program, or an object capable of conducting the processing), a command processing section 411 (hardware, a program, or an object capable of conducting the processing), a secondary remote copy processing section 412 (hardware, a program, or an object capable of conducting the processing), and a disk access control section 43 (hardware, a program, or an object capable of conducting the processing).

The disk control processing section 41 is a control processing section for controlling operation of the whole secondary disk subsystem apparatus. When switching from the primary system to the secondary system is conducted and database processing in the database processing system in the secondary system is started, the command processing section 411 reads out a log record from a location of a log block 462 a indicated by status information in a secondary status disk 45 and sends out the log record to the secondary host computer 3, in accordance with an order issued by the secondary host computer 3. By modifying data on a secondary DB disk 44 in the secondary disk subsystem 4 according to contents of the pertinent log record analyzed by the secondary host computer 3 in accordance with an order issued by the secondary host computer 3, the disk control processing section 41 conducts processing of restoring the state of the secondary DB disk 44 in the secondary disk subsystem 4 to the state of the primary DB disk 24 immediately before the switching to the secondary system. The disk control processing section 41 conducts modification on the secondary DB disk 44, the secondary status disk 45 and a secondary log disk 46 in keeping with database processing after the switching.

The secondary remote copy processing section 412 is a processing section for receiving a write request of the DB block 242 a, the status information or the log block 262 a from the primary disk subsystem 2, and conducting modification on the secondary DB disk 44, the secondary status disk 45 and the secondary log disk 46 in the secondary disk subsystem 4, or on a cache memory 42 storing their contents. The disk access control section 43 is a processing section for controlling access to respective magnetic disk devices placed under the secondary disk subsystem 4. In the case of the asynchronous remote copy, modification on the pertinent cache memory 42 or disk is conducted after confirmation of the sequentiality, as described in JP-A-11-85408 entitled “Storage control apparatus.”
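One way to picture the confirmation of sequentiality for asynchronous remote copy is the sketch below. It is not the mechanism of JP-A-11-85408, whose details are not given here; it assumes that each asynchronous write carries a sequence number assigned by the primary side, and the class name `SecondaryAsyncReceiver` is hypothetical.

```python
class SecondaryAsyncReceiver:
    """Applies asynchronous writes strictly in sequence-number order."""

    def __init__(self):
        self.expected = 0   # next sequence number that may be applied
        self.held = {}      # writes that arrived ahead of their turn
        self.disks = {"db": {}, "status": {}}

    def receive(self, seq, disk, address, data):
        self.held[seq] = (disk, address, data)
        # Apply every write whose predecessors have all been applied.
        while self.expected in self.held:
            d, a, v = self.held.pop(self.expected)
            self.disks[d][a] = v
            self.expected += 1


receiver = SecondaryAsyncReceiver()
# The status write (seq 1) arrives before the DB block it depends on (seq 0).
receiver.receive(1, "status", "checkpoint", {"log_position": 3})
held_back = dict(receiver.disks["status"])  # still empty at this point
receiver.receive(0, "db", "block-1", 10)
```

Holding the status write back until the earlier DB block has been applied keeps the status information from pointing past data that has not yet reached the secondary DB disk.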

A program for making the secondary disk subsystem 4 function as the disk control processing section 41, the command processing section 411, the secondary remote copy processing section 412 and the disk access control section 43 is recorded on a recording medium such as a floppy disk, and executed. The recording medium for recording the program thereon may also be another recording medium other than the floppy disk. The program may be installed in the information processing apparatus from the recording medium and used, or the recording medium may be accessed through a network to use the program.

In the disaster recovery system of the present embodiment, the primary disk subsystem 2 for the primary host computer 1 serving as the primary system and the secondary disk subsystem 4 for the secondary host computer 3 serving as the secondary system may be connected to each other via a fiber channel, a network such as Ethernet, Gigabit Ethernet or SONET, or a link. The connection means may be a virtual network, or any data communication means using radio, broadcast communication or satellite communication.

In the primary host computer 1, the DB access control section 111 of the primary system operates. The primary host computer 1 includes the DB buffer 12 for temporarily holding contents of the primary DB disk 24 in the primary disk subsystem 2, and the log buffer 14 for temporarily holding contents of modification processing conducted on the DB buffer 12. Each of the DB buffer 12 and the log buffer 14 may also be a volatile memory, which typically loses data at the time of a power failure.

In the primary disk subsystem 2, the primary DB disk 24 on a magnetic disk device is accessed through the disk control processing section 21, the cache memory 22 and the disk access control section 23, which receive an instruction from the primary host computer 1 and operate. Disk access is always conducted via the cache memory 22. The cache memory 22 may also be a nonvolatile memory, which retains data at the time of a power failure. In this case, at the time when data is stored in the cache memory 22, the data is guaranteed.

If access to the primary DB disk 24 is requested by a transaction, then the DB access control section 111 in the primary host computer 1 of the present embodiment acquires the DB block 242 a from the primary disk subsystem 2 by using a read command, stores the DB block 242 a in the DB buffer 12, conducts database processing on the DB block 242 a in the DB buffer 12, and then stores log information indicating contents of the processing in the log block 262 a in the log buffer 14.

If it has become necessary to force the contents of the DB buffer 12 in the primary host computer 1 to a storage device in the primary disk subsystem 2 serving as a disk subsystem in the primary system, such as when the number of log records indicating that records on the DB buffer 12 have been modified reaches a predetermined number, then the checkpoint processing section 112 generates a write command for writing a DB block or status information, as a write request for all DB blocks modified in the DB buffer 12 and status information indicating a location of a log record that is the latest at that time point, and transmits the write command from the primary host computer 1 to the primary disk subsystem 2.

If, at the time of transaction commit, a predetermined condition, such as elapse of a predetermined time since the start of log information recording or exhaustion of free space in the log buffer 14, is satisfied, then the log management section 113 generates a write command for writing the log block 262 a, as a write request of the log block 262 a stored in the log buffer 14 into the primary log disk 26, and transmits the write command from the primary host computer 1 to the primary disk subsystem 2.

If a predetermined condition, such as elapse of a predetermined time since the start of database processing or exhaustion of free space in the DB buffer 12, is satisfied, then the DB delay write processing section 114 generates a write command for writing the DB block 242 a, as a write request of the DB block 242 a stored in the DB buffer 12 into the primary DB disk 24, and transmits the write command from the primary host computer 1 to the primary disk subsystem 2.
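The interplay of the DB buffer 12, the log buffer 14 and the checkpoint described in the preceding paragraphs can be sketched as follows. The class `PrimaryHost`, the `write` callback, the record layout and the `checkpoint_every` threshold are hypothetical. As a simplification, the log buffer is written at every commit and a checkpoint is taken once a set number of log records has been written; time-based and buffer-full conditions are omitted.

```python
class PrimaryHost:
    def __init__(self, write, checkpoint_every=3):
        self.write = write            # callable(kind, payload) to the disk subsystem
        self.db_buffer = {}           # block id -> contents
        self.dirty = set()            # blocks modified since the last checkpoint
        self.log_buffer = []          # log records not yet written
        self.log_written = 0          # records already on the log disk
        self.since_checkpoint = 0
        self.checkpoint_every = checkpoint_every

    def update(self, txn, block_id, value):
        # Database processing is conducted on the buffer and logged.
        self.db_buffer[block_id] = value
        self.dirty.add(block_id)
        self.log_buffer.append((txn, "update", block_id, value))

    def commit(self, txn):
        self.log_buffer.append((txn, "commit", None, None))
        self.flush_log()
        if self.since_checkpoint >= self.checkpoint_every:
            self.checkpoint()

    def flush_log(self):
        if self.log_buffer:
            self.write("log", list(self.log_buffer))
            self.log_written += len(self.log_buffer)
            self.since_checkpoint += len(self.log_buffer)
            self.log_buffer.clear()

    def checkpoint(self):
        # Write the log first, then all modified DB blocks, then the
        # status information giving the latest log location.
        self.flush_log()
        for block_id in sorted(self.dirty):
            self.write("db", (block_id, self.db_buffer[block_id]))
        self.dirty.clear()
        self.write("status", {"log_position": self.log_written})
        self.since_checkpoint = 0
```

In this sketch nothing reaches the disk subsystem until the commit, after which the log write is followed by the DB block writes and finally the status write.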

With respect to a write request of the log block 262 a included in write requests transmitted from the primary host computer 1 as described above, the primary disk subsystem 2 of the present embodiment conducts synchronous remote copy processing to the secondary disk subsystem 4 in synchronism with writing performed in the primary disk subsystem 2. With respect to writing of a DB block and status information, the primary disk subsystem 2 of the present embodiment conducts asynchronous remote copy processing, which is not synchronized to writing in the primary disk subsystem 2, to the secondary disk subsystem 4.

FIG. 2 is a diagram showing an outline of synchronous remote copy processing of the log block 262 a in the present embodiment. If a primary log write request for requesting to write the log block 262 a is transmitted from the primary host computer 1 as shown in FIG. 2, then the primary disk subsystem 2 writes the log block 262 a transmitted together with the write request into the cache 22, transmits the log block 262 a to the secondary disk subsystem 4, requests remote copy of the log block 262 a in the secondary disk subsystem 4, and waits for completion of the remote copy.

If a command for requesting to write the log block 262 a is transmitted from the primary disk subsystem 2, then the secondary disk subsystem 4 writes the log block 262 a transmitted together with the write request into the cache 42, and thereafter generates a remote copy completion notice indicating that the writing has been completed, and transmits the remote copy completion notice to the primary disk subsystem 2.

Upon receiving the remote copy completion notice from the secondary disk subsystem 4, the primary disk subsystem 2 generates a primary log write completion notice indicating that the writing of the log block 262 a requested by the primary host computer 1 has been completed, and transmits the primary log write completion notice to the primary host computer 1.
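The synchronous exchange of FIG. 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the class and method names (`SecondarySubsystem`, `PrimarySubsystem`, `write_log_block`, `remote_copy`) are hypothetical, the caches are modeled as dictionaries, and the network transfer is modeled as a direct method call.

```python
class SecondarySubsystem:
    """Hypothetical stand-in for the secondary disk subsystem 4."""

    def __init__(self):
        self.cache = {}

    def remote_copy(self, address, block):
        self.cache[address] = block        # write into the secondary cache
        return "remote copy completion"    # notice returned to the primary


class PrimarySubsystem:
    """Hypothetical stand-in for the primary disk subsystem 2."""

    def __init__(self, secondary):
        self.cache = {}
        self.secondary = secondary

    def write_log_block(self, address, block):
        self.cache[address] = block                          # write into the primary cache
        notice = self.secondary.remote_copy(address, block)  # blocks until the copy is done
        if notice != "remote copy completion":
            raise RuntimeError("remote copy failed")
        return "primary log write completion"                # reported only after the copy


secondary = SecondarySubsystem()
primary = PrimarySubsystem(secondary)
result = primary.write_log_block(("LOG1", 0), b"log record")
```

The point of the sketch is the ordering: the completion notice to the host is produced only after the secondary side has acknowledged the log block.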

FIG. 3 is a diagram showing an outline of asynchronous remote copyprocessing of a DB block and status information in the presentembodiment. If a primary DB write request for requesting to write the DBblock and the status information is transmitted from the primary hostcomputer 1 as shown in FIG. 3, then the primary disk subsystem 2 writesthe DB block and the status information transmitted together with thewrite request into the cache 22, thereafter temporarily stores the DBblock and the status information in a queue in a memory or a magneticdisk in the primary disk subsystem 2, generates a primary DB writecompletion notice indicating that writing the DB block 242 a requestedby the primary host computer 1 has been completed, and transmits theprimary DB write completion notice to the primary host computer 1.

Thereafter, the primary disk subsystem 2 transmits the stored DB blockor status information to the secondary disk subsystem 4, requests remotecopy of the DB block and status information in the secondary disksubsystem 4, and waits for completion of the remote copy.

If a remote copy request for requesting to write the DB block or statusinformation is transmitted from the primary disk subsystem 2, then thesecondary disk subsystem 4 receives the DB block or status informationtransmitted together with the remote copy request, thereafter generatesa remote copy completion notice indicating that the request has beencompleted, and transmits the remote copy completion notice to theprimary disk subsystem 2.
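The asynchronous path of FIG. 3 differs from the synchronous one in that the completion notice is returned to the host before the data reaches the secondary side. A minimal sketch, assuming a first-in first-out queue held in memory; the names `AsyncPrimary`, `write_db`, and `drain` are hypothetical and the secondary cache is modeled as a plain dictionary.

```python
from collections import deque


class AsyncPrimary:
    """Hypothetical model of the primary disk subsystem 2 for DB/status writes."""

    def __init__(self):
        self.cache = {}
        self.queue = deque()   # pending remote copy data

    def write_db(self, address, data):
        self.cache[address] = data
        self.queue.append((address, data))     # store temporarily for later transfer
        return "primary DB write completion"   # returned without waiting for the copy

    def drain(self, secondary_cache):
        """Send queued writes to the secondary side in arrival order."""
        sent = 0
        while self.queue:
            address, data = self.queue.popleft()
            secondary_cache[address] = data
            sent += 1
        return sent


primary = AsyncPrimary()
secondary_cache = {}
notice = primary.write_db(("DB1", 7), b"page")
copied_before_drain = len(secondary_cache)
sent = primary.drain(secondary_cache)
```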

FIG. 4 is a diagram showing configuration information of a DB-diskmapping table 15 in the present embodiment. As shown in FIG. 4, theDB-disk mapping table 15 stores a database area ID, a file ID, and akind. The database area ID is information for identifying a databasearea in the primary DB disk 24. The file ID indicates a sequentialnumber of a file in the case where the database area identified by thedatabase area ID includes a plurality of files. The kind indicates whichof database data, log information and status information is data storedin the database area.

With respect to a disk control device number for identifying a diskcontrol device to which the database area is mapped, and a physicaldevice ID of a magnetic disk device included in magnetic disk devicescontrolled by a disk control device having the disk control devicenumber to which the database area is mapped, IDs of the primary disksubsystem 2 and the secondary disk subsystem 4 are stored.

A DB-disk mapping table 35 in the secondary disk subsystem 4 also has aconfiguration similar to that of the DB-disk mapping table 15 in theprimary disk subsystem 2.

FIG. 5 is a diagram showing an example of a primary/secondary remotecopy management table in the present embodiment. As shown in FIG. 5, acopy mode indicating whether the write processing is conductedsynchronously or asynchronously is stored in a primary remote copymanagement table 213 and a secondary remote copy management table 413.With respect to a disk control device number of a disk control device inwhich write processing is conducted with that copy mode, and a physicaldevice ID of a magnetic disk device, IDs in the primary disk subsystem 2and the secondary disk subsystem 4 are stored.

On the basis of information in the DB-disk mapping table 15 shown inFIG. 4 and information in the primary remote copy management table 213shown in FIG. 5, it can be determined whether each of the log block, DBblock and status information is written into the secondary disksubsystem synchronously or asynchronously. For example, on the basis ofFIG. 4, a log block in a database area ID “LOG1” is written into amagnetic disk device having a primary disk control device ID “CTL#A1”and a primary physical device ID “VOL12-A.” With reference to FIG. 5,the copy mode for the magnetic disk device having the primary diskcontrol device ID “CTL#A1” and the primary physical device ID “VOL12-A”is “synchronous.” Therefore, the log block in the database area ID“LOG1” is written into the secondary disk subsystem 4 by the synchronousremote copy processing.
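The two-table lookup described above can be sketched as follows. Only the “LOG1” row (“CTL#A1”, “VOL12-A”, “synchronous”) is taken from the text; the “DB1” row, the volume “VOL10-A”, and the dictionary layout are assumptions made for illustration.

```python
# Modeled on FIG. 4: database area ID -> kind and primary device location.
DB_DISK_MAPPING = {
    "LOG1": {"kind": "log", "ctl": "CTL#A1", "vol": "VOL12-A"},
    "DB1": {"kind": "database data", "ctl": "CTL#A1", "vol": "VOL10-A"},  # hypothetical row
}

# Modeled on FIG. 5: (disk control device ID, physical device ID) -> copy mode.
REMOTE_COPY_MGMT = {
    ("CTL#A1", "VOL12-A"): "synchronous",
    ("CTL#A1", "VOL10-A"): "asynchronous",  # hypothetical row
}


def copy_mode(area_id):
    """Resolve a database area ID to its remote copy mode via both tables."""
    row = DB_DISK_MAPPING[area_id]
    return REMOTE_COPY_MGMT[(row["ctl"], row["vol"])]
```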

On the other hand, the system serving as the secondary system also has asimilar configuration. The primary disk subsystem 2 and the secondarydisk subsystem 4 are connected to each other via the network. In thestandby state, the secondary host computer 3 is not in operation. Thesecondary disk subsystem 4 receives the log block, DB block and statusinformation from the primary disk subsystem 2 via the network, andmodifies disks respectively corresponding to them.

When acquiring a checkpoint, the checkpoint processing section 112 inthe primary host computer 1 of the present embodiment stores all DBblocks modified on the DB buffer 12 in the primary DB disk 24, andstores status information indicating the location of the log record atthat time in the primary status disk 25. Hereafter, this checkpointacquisition processing will be described.

FIG. 6 is a flow chart showing a processing procedure of the checkpointacquisition processing in the present embodiment. When it has becomenecessary to force the contents of the DB buffer 12 in the primary hostcomputer 1 to a storage device in the primary disk subsystem 2 servingas a disk subsystem in the primary system, the checkpoint processingsection 112 in the primary host computer 1 conducts processing oftransmitting a write request for all DB blocks modified in the DB buffer12 and the status information indicating the location of the log recordthat is the latest at that time point, from the primary host computer 1to the primary disk subsystem 2 as shown in FIG. 6.

At step 701, the checkpoint processing section 112 generates acheckpoint acquisition start log, which indicates that the checkpointacquisition has been started, and stores the checkpoint acquisitionstart log in the log block 262 a.

At step 702, the checkpoint processing section 112 generates a write command for writing all DB blocks modified on the DB buffer 12 into the primary disk subsystem 2, and transmits the write command to the primary disk subsystem 2 to request the primary disk subsystem 2 to write the DB blocks. The primary disk subsystem 2 receives the write command, writes the DB blocks into the cache memory 22, and thereby forces the modifications conducted in the DB buffer 12 to the cache memory 22.

Step 703 will be described at the end of the description of the presentembodiment.

At step 704, a checkpoint acquisition end log, which indicates that thecheckpoint acquisition has been finished, is generated and stored in thelog block 262 a.

At step 705, a write command for writing an LSN (Log Sequence Number) ofthe checkpoint acquisition end log into the primary disk subsystem 2 asstatus information is generated, and the write command is transmitted tothe primary disk subsystem 2 to request the primary disk subsystem 2 towrite the status information. Upon receiving the write command in theprimary disk subsystem 2, the status information is written into theprimary status disk 25.
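Steps 701, 702, 704, and 705 can be sketched as follows. This is a simplified illustration, not the procedure of FIG. 6 itself: the function name, the data layout, and the use of a list index as the LSN are assumptions, and step 703 is left out because its description is deferred in the text.

```python
def acquire_checkpoint(db_buffer, log, storage):
    """db_buffer: address -> (data, dirty flag); log: list of log records;
    storage: {'db': {...}, 'status': ...} (all hypothetical layouts)."""
    log.append("checkpoint acquisition start")              # step 701
    for address, (data, dirty) in list(db_buffer.items()):  # step 702
        if dirty:
            storage["db"][address] = data                   # write modified DB block
            db_buffer[address] = (data, False)
    # Step 703 is intentionally omitted here.
    log.append("checkpoint acquisition end")                # step 704
    end_lsn = len(log) - 1                                  # LSN modeled as a list index
    storage["status"] = end_lsn                             # step 705
    return end_lsn


db_buffer = {1: (b"a", True), 2: (b"b", False)}
log = ["update page 1"]
storage = {"db": {}, "status": None}
lsn = acquire_checkpoint(db_buffer, log, storage)
```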

In the case where database processing in the primary database processingsystem is terminated abnormally because of a failure or the like andthereafter the processing in the primary database processing system isresumed, the state of the database that has been completed untilimmediately before the termination can be recovered by reading out a logrecord from a location indicated by status information in the primarystatus disk 25 and modifying data in the primary DB disk 24 according tocontents of the log record.

Supposing that in the disaster recovery system of the present embodimentthe primary host computer 1 has requested the primary disk subsystem 2to write the log block, DB block or status information, processingconducted in the primary disk subsystem 2 will now be described.

FIG. 7 is a flow chart showing a processing procedure taken in thepresent embodiment when a write command has been received. Uponreceiving a command from the primary host computer 1 as shown in FIG. 7,the command processing section 211 in the primary disk subsystemanalyzes the received command to find a command kind and an address tobe accessed, and recognizes that the command is a write command (step341). It is now supposed that a device ID requested to be accessed canbe acquired from the address to be accessed by comparing the address tobe accessed with information in a device configuration management table,which indicates addresses assigned to a plurality of disk subsystems andtheir magnetic disk devices.

Subsequently, it is determined whether data of the address to be accessed found at the step 341 is held in the cache memory 22 in the primary disk subsystem 2, and a cache hit/miss decision is conducted (step 342).

In the case of a cache miss in which the data to be accessed is not heldin the cache memory 22, a transfer destination cache area is secured.The cache address of the transfer destination is managed and acquired byusing a typical method such as a cache vacancy list.

If a cache hit is judged at the step 342 to hold true, or securing of a cache area is finished at step 344, then modification of the data is conducted on the cache memory 22 in the primary disk subsystem 2 (step 345). In other words, contents of the DB block 242 a, the status information, or the log block 262 a received from the primary host computer 1 are written into the cache memory 22.

At step 346, the primary remote copy management table 213 is referredto, and a copy mode corresponding to the primary disk control device IDand the primary physical device ID indicated by the address to beaccessed is read out to make a decision whether the copy mode is“synchronous.”

If the copy mode is “synchronous” as a result of the decision, i.e., thereceived write request is a write request for the log block 262 a, thenthe processing proceeds to step 347. At the step 347, completion of thesynchronous remote copy is waited for, and thereby synchronous remotecopy processing of the log block 262 a is conducted.

If the copy mode is “asynchronous,” i.e., the received write request isa write request of the DB block 242 a or the status information, thenthe processing proceeds to step 348. At the step 348, the received datais temporarily stored in a queue in a memory or a magnetic disk in theprimary disk subsystem 2 in order to prepare for asynchronous remotecopy processing to be conducted thereafter on the secondary disksubsystem 4.

At step 349, completion of the write command processing is reported tothe primary host computer 1.

The primary disk subsystem 2 transmits the stored data to the secondarydisk subsystem 4, and executes asynchronous remote copy processing ofthe DB block or status information to the secondary disk subsystem 4.
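The write command flow of FIG. 7 can be summarized in a short sketch. The function name `handle_write`, the command dictionary, and the volume “VOL10-A” are hypothetical; the synchronous send is passed in as a callable so that the example stays self-contained.

```python
from collections import deque


def handle_write(cmd, cache, copy_modes, send_sync, queue):
    """Process one write command along the lines of steps 341 to 349."""
    device = cmd["device"]                    # step 341: device derived from the address
    key = (device, cmd["offset"])
    if key not in cache:                      # step 342: cache hit/miss decision
        cache[key] = None                     # on a miss, secure a cache area
    cache[key] = cmd["data"]                  # step 345: modify data on the cache
    if copy_modes[device] == "synchronous":   # step 346: look up the copy mode
        send_sync(key, cmd["data"])           # step 347: wait for the remote copy
    else:
        queue.append((key, cmd["data"]))      # step 348: queue for asynchronous copy
    return "write complete"                   # step 349: report to the host


cache, queue, remote = {}, deque(), {}
modes = {"VOL12-A": "synchronous", "VOL10-A": "asynchronous"}
handle_write({"device": "VOL12-A", "offset": 0, "data": b"log"},
             cache, modes, remote.__setitem__, queue)
handle_write({"device": "VOL10-A", "offset": 0, "data": b"db"},
             cache, modes, remote.__setitem__, queue)
```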

FIG. 8 is a flow chart showing a processing procedure taken in thepresent embodiment when a read command has been received. Upon receivinga command from the primary host computer 1 as shown in FIG. 8, thecommand processing section 211 analyzes the received command to find acommand kind and an address to be accessed, and recognizes that thecommand is a read access request (step 361). It is now supposed that adevice ID requested to be accessed can be acquired from the address tobe accessed.

Subsequently, it is determined whether data of the address to be accessed found at the step 361 is held in the cache memory 22 in the primary disk subsystem 2, and a cache hit/miss decision is conducted (step 362).

In the case of a cache miss in which the data to be accessed is not heldin the cache memory 22, a device ID requested to be accessed isdiscriminated as described above, and the disk access control section 23in the primary disk subsystem 2 is requested to transfer from a magneticdisk device corresponding to the device ID to the cache memory 22 (step363). In this case, the read processing is interrupted until the end oftransfer (step 364), and the read processing is continued again afterthe end of the transfer processing. The cache address of the transferdestination may be managed and acquired by using a typical method suchas a cache vacancy list. As for the address of the transfer destination,however, it is necessary to modify a cache management table and therebyconduct registration.

If a cache hit is judged at the step 362 to hold true, or the transferprocessing is finished at step 364, then data in the cache memory in thedisk subsystem is transferred to a channel (step 365).
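The read path of FIG. 8 reduces to a cache lookup with staging from disk on a miss. A minimal sketch, assuming dictionary-backed cache and disk; the function name `handle_read` is hypothetical, and the interruption and resumption of the read (step 364) is modeled as an ordinary blocking copy.

```python
def handle_read(address, cache, disk):
    """Return the data for `address`, staging it from disk on a cache miss."""
    if address not in cache:              # step 362: cache hit/miss decision
        cache[address] = disk[address]    # steps 363-364: transfer from disk, then resume
    return cache[address]                 # step 365: transfer to the channel


disk = {("VOL10-A", 3): b"page 3"}
cache = {}
first = handle_read(("VOL10-A", 3), cache, disk)    # miss, staged from disk
second = handle_read(("VOL10-A", 3), cache, disk)   # hit
```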

Supposing that in the disaster recovery system of the present embodimentthe primary disk subsystem 2 has requested the secondary disk subsystem4 to write the log block synchronously or write the DB block or statusinformation asynchronously, processing conducted in the secondary disksubsystem 4 will now be described.

FIG. 9 is a flow chart showing a processing procedure of data reception processing conducted by the secondary disk subsystem 4 in the present embodiment. Upon receiving a command from the primary disk subsystem 2 as shown in FIG. 9, the secondary remote copy processing section 412 in the secondary disk subsystem 4 analyzes the received command to find a command kind and an address to be accessed, and recognizes that the command is a remote copy command (step 421). It is now supposed that a device ID requested to be accessed can be discriminated from the address to be accessed.

Subsequently, it is determined whether data of the address to be accessed found at the step 421 is held in the cache memory 42 in the secondary disk subsystem 4, and a cache hit/miss decision is conducted (step 422).

In the case of a cache miss in which the data to be accessed is not heldin the cache memory 42, a transfer destination cache area is secured.The cache address of the transfer destination may be managed andacquired by using a typical method such as a cache vacancy list. As forthe address of the transfer destination, however, it is necessary tomodify a cache management table and thereby conduct registration.

If a cache hit is judged at the step 422 to hold true, or securing of a cache area is finished at step 424, then modification of the data is conducted on the cache memory 42 in the secondary disk subsystem 4 (step 425). In other words, contents of the DB block 242 a, the status information, or the log block 262 a received from the primary disk subsystem 2 are written into the cache memory 42. The case of the synchronous remote copy has heretofore been described. In the case where asynchronous remote copy is used and the sequentiality as described in JP-A-11-85408 entitled “storage control apparatus” is guaranteed, it is necessary before modification on the cache to ascertain that all data that should arrive by then are ready.

At step 426, completion of the remote copy command processing is reported to the primary disk subsystem 2.

As for the log block write request, synchronous remote copy processing in the secondary disk subsystem 4, synchronized with the writing in the primary disk subsystem 2, is conducted in the disaster recovery system of the present embodiment as described above. Therefore, the contents of modifications made by transactions completed in the primary system can be prevented from being lost in the secondary system. As for the DB block and status information writing, asynchronous remote copy processing in the secondary disk subsystem 4, which is not synchronized with the writing in the primary disk subsystem 2, is conducted. Therefore, performance degradation in the primary system can be prevented as far as possible.

If writing in the secondary disk subsystem 4 is conducted as describedabove and thereafter a failure occurs in the primary database processingsystem and database processing is started in the secondary databaseprocessing system, then in DBMS start processing log information is readout from a location indicated by status information in the secondarystatus disk 45, and the state of the database area in the primary systemimmediately before the occurrence of the failure is recovered on thesecondary DB disk 44 in the secondary disk subsystem 4.

FIG. 10 is a flow chart showing a processing procedure of the DBMS startprocessing in the present embodiment. If switching from the primarysystem to the secondary system is conducted and database processing inthe secondary database processing system is started, then the DB accesscontrol section 311 in the secondary host computer 3 orders thesecondary disk subsystem 4 to execute the DBMS start processing.

At step 1201, the command processing section 411 in the secondary disksubsystem 4 reads out a status file on the secondary status disk 45, andacquires information indicating the state of the database. It is nowsupposed that information indicating that the DBMS is in operation isstored in the status file as information indicating the database stateat the time of database processing start and information indicating thatthe DBMS has been normally finished is stored in the status file at theend of the database processing.

At step 1202, it is determined whether the database processing of the last time has been finished normally, by referring to the acquired information indicating the database state. If the acquired database state indicates that the DBMS is in operation, i.e., information indicating that the DBMS has been finished normally is not recorded in the status file, then the database processing of the last time is regarded as not having been finished normally and the processing proceeds to step 1203.

At the step 1203, status information indicating a location of a logrecord at the time of immediately preceding checkpoint is referred to,and an input location of the log record is acquired.

At step 1204, the secondary log disk 46 is referred to in order to readout the log record from the acquired input location, and rollforwardprocessing is conducted on the database area in the secondary DB disk44.

At step 1205, rollback processing for canceling processing ofuncompleted transactions among transactions subjected to the rollforwardprocessing using the log record is conducted.

At step 1206, information indicating that the DBMS is in operation andstatus information indicating the location of the log record afterrecovery are stored in the status file in the secondary status disk 45.
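Steps 1201 to 1206 can be sketched as follows. This is a simplified illustration under several assumptions: the log record layout `(transaction, kind, key, before, after)`, the status dictionary, and the function name `dbms_start` are hypothetical, the LSN is modeled as a list index, and transactions that began before the checkpoint are not handled.

```python
def dbms_start(status, log, db):
    """Recover `db` from `log`, starting at the location held in `status`."""
    if status["state"] != "in operation":     # steps 1201-1202: last run ended normally
        status["state"] = "in operation"
        return db
    start = status["checkpoint_lsn"]          # step 1203: input location of the log
    committed = set()
    for txn, kind, key, before, after in log[start:]:            # step 1204: rollforward
        if kind == "update":
            db[key] = after
        elif kind == "commit":
            committed.add(txn)
    for txn, kind, key, before, after in reversed(log[start:]):  # step 1205: rollback
        if kind == "update" and txn not in committed:
            db[key] = before                  # cancel updates of uncompleted transactions
    status["state"] = "in operation"          # step 1206: record state and log location
    status["checkpoint_lsn"] = len(log)
    return db


log = [
    ("T0", "update", "x", 0, 1),
    ("T0", "commit", None, None, None),
    ("T1", "update", "x", 1, 5),
    ("T1", "commit", None, None, None),
    ("T2", "update", "y", 0, 9),              # T2 never commits
]
status = {"state": "in operation", "checkpoint_lsn": 2}
db = dbms_start(status, log, {"x": 1, "y": 0})
```

In this example the committed update of T1 survives, while the update of the uncompleted transaction T2 is rolled back.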

In general, in a conventional DBMS, data modified in a transaction is not written to the storage in synchronism with the committing of the pertinent transaction, in order to ensure the execution performance of the transaction. Instead, a trigger called a checkpoint, which fires after a predetermined number of transactions or a predetermined time, is provided. Upon the trigger, DB data modified during that time is written to the storage. DB contents modified after the checkpoint are written to the log disk. In restart processing after a server failure, DB modifications made after the checkpoint are restored and recovered from the modification history in the log disk.

At the time of restart after the server has gone down, the question arises of which log disk, and which location in it, log information should be forced from after the latest checkpoint. In general, such information is stored in a header portion or the like on the log disk. The log disk and the read location that become the subject of the force at the time of restart are determined on the basis of this information.

In the case where a log disk is subjected to synchronous copy and a DB disk is subjected to asynchronous copy in such a conventional DBMS, there is a possibility that DB modification contents that have been checkpointed on the log disk in the main site have not yet been transferred. In that case, DB modification contents forced to the storage in the main site at the time of the checkpoint are lost in the remote site, and a mismatch arises in the recovery.

On the other hand, in the disaster recovery system of the presentembodiment, a status file for managing a log disk input point at thetime of checkpoint is provided so as to prevent mismatching from beingcaused in recovery in the secondary disk subsystem 4 even if a log blockis subjected to synchronous remote copy processing and a DB block issubjected to asynchronous remote copy processing. In addition, thestatus file is transferred in asynchronous remote copy processing, andthe modification order between the status file and the DB blocktransferred asynchronously in the same way is guaranteed by thesecondary disk subsystem 4.

As a result, it is possible to refer to the status file on the secondarystatus disk 45 at the time of database processing start after theswitching from the primary system to the secondary system, and conductrecovery from a location indicated by the status information.

In the disaster recovery system of the present embodiment, a writerequest at the time of checkpoint is also transmitted to the secondarydisk subsystem 4 asynchronously as described above. In the case where awrite request at the time of checkpoint has been issued, however, thatwrite request and write requests temporarily stored until that timepoint for asynchronous writing may also be transmitted to the secondarydisk subsystem 4.

FIG. 11 is a diagram showing an outline of processing conducted at thetime of checkpoint in the present embodiment. If a primary DB volumecheckpoint request for requesting a checkpoint of the primary DB disk 24is transmitted from the primary host computer 1 as shown in FIG. 11,then the primary disk subsystem 2 transmits remote copy data temporarilystored in the queue in the memory or magnetic disk in the primary disksubsystem 2 at that time to the secondary disk subsystem 4, andtransmits the DB block 242 a and status information received togetherwith the primary DB volume checkpoint request to the secondary disksubsystem 4.

The secondary disk subsystem 4 writes all of the DB block 242 a andstatus information transmitted together with the write request into thecache 42, and then generates a remote copy completion notice, whichindicates that the writing has been completed, and transmits the remotecopy completion notice to the primary disk subsystem 2.

Upon receiving the remote copy completion notice from the secondary disksubsystem 4, the primary disk subsystem 2 generates a primary DB volumecheckpoint completion notice indicating that the checkpoint processingrequested by the primary host computer 1 has been completed, andtransmits the primary DB volume checkpoint completion notice to theprimary host computer 1.
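The checkpoint-time variant of FIG. 11 can be sketched as a flush of the pending queue followed by the checkpoint writes themselves, with the completion notice returned only afterwards. The function name `checkpoint_request` and the data layout are hypothetical, and the secondary cache is modeled as a dictionary.

```python
from collections import deque


def checkpoint_request(queue, checkpoint_writes, secondary_cache):
    """Send queued remote copy data first, then the checkpoint writes."""
    for address, data in list(queue) + list(checkpoint_writes):
        secondary_cache[address] = data      # applied in order on the secondary side
    queue.clear()                            # nothing is left pending afterwards
    return "primary DB volume checkpoint completion"


queue = deque([(("DB1", 1), b"old"), (("DB1", 2), b"queued")])
checkpoint_writes = [(("DB1", 1), b"new"), (("STATUS", 0), b"lsn=42")]
secondary_cache = {}
notice = checkpoint_request(queue, checkpoint_writes, secondary_cache)
```

Because queued data is sent before the checkpoint data, a later write to the same address overrides the earlier one on the secondary side.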

In the case where synchronization processing of the primary disksubsystem 2 and the secondary disk subsystem 4 is conducted at the timeof the log block write request and the checkpoint request in thedisaster recovery system of the present embodiment, the contents ofmodification in transactions completed in the primary system areprevented from being lost in the secondary system, and writing the DBblock and status information is conducted collectively at the time ofcheckpoint. As compared with the case where all of the DB block andstatus information are transferred by using synchronous remote copy,therefore, performance degradation in the primary system can beprevented. Even in a configuration using a database management systemthat does not have a dedicated status file, DB modification data forcedto the storage in the primary system at the time of checkpoint is notlost in the secondary system.

According to the disaster recovery system of the present embodiment, loginformation is modified by synchronous writing and database data andstatus information are modified by asynchronous writing, when writing tothe secondary system is requested, as heretofore described. Therefore,the contents of modification in transactions completed in the primarysystem are prevented from being lost in the secondary system. It ispossible to construct a disaster recovery system reduced in performancedegradation in the primary system.

According to the present invention, it becomes possible to reduce thepossibility that modification contents of transactions completed inexecution are lost in the transaction processing.

It should be further understood by those skilled in the art thatalthough the foregoing description has been made on embodiments of theinvention, the invention is not limited thereto and various changes andmodifications may be made without departing from the spirit of theinvention and the scope of the appended claims.

1. A computer system, which is a primary system to a secondary computer system comprising a secondary host computer and a secondary storage subsystem, the computer system comprising: a primary host computer executing a primary database management system program corresponding to a secondary database management system program to be executed by the secondary host computer; and a primary storage subsystem coupled to the primary host computer and the secondary storage subsystem, wherein the primary storage subsystem, in response to a write request from the primary host computer, stores primary log information, primary database data, and primary status information, which are modified by the primary host computer based on the execution of the primary database management system program, the primary status information indicating a location of log information to be used at a time of switching transaction processing from the primary host computer to the secondary host computer, wherein, to the secondary storage subsystem, the primary storage subsystem executes a synchronous remote copy of the primary log information and an asynchronous remote copy of the primary database data and the primary status information, the secondary database management system program causing the secondary host computer to: (A) read a copy of the primary status information stored in the secondary storage subsystem, created by the asynchronous remote copy; (B) based on the copy of the primary status information, decide locations on a copy of the primary log information created by the synchronous remote copy, to be used to modify a copy of the secondary database data created by the asynchronous remote copy; (C) read a part of the copy of the primary log information indicated by the locations on the copy of the primary log information; and (D) modify the copy of the primary database data according to the part of the secondary log information so that modification of a completed transaction processed by the primary host computer is stored in the copy of the primary database data.

2. A computer system according to claim 1, wherein, as to the modification by the primary host computer, the primary database management system program causes the primary host computer to: (i) modify the primary database data in the primary storage subsystem based on modification data temporarily buffered in the primary host computer; (ii) store a checkpoint acquisition log to the primary log information in the primary storage subsystem; and (iii) modify the primary status information in the primary storage subsystem after the processing of (i) and (ii).

3. A computer system according to claim 2, wherein the primary storage subsystem includes a first primary disk, a second primary disk, and a third primary disk, wherein the primary log information is stored in the first primary disk, wherein the primary database data is stored in the second primary disk, and wherein the primary status information is stored in the third primary disk.

4. A computer system according to claim 3, wherein the step (D) comprises a roll-forward processing and a roll-back processing.

5. A computer system according to claim 4, wherein the secondary database management system program causes the secondary host computer to: (E) change an operation state from another state after completion of the roll-forward processing and the roll-back processing.

6. A disaster recovery method for a computer system being a primary system to a secondary computer system including a secondary host computer and a secondary storage subsystem, the computer system including a primary host computer and a primary storage subsystem, the method comprising: by the primary host computer, executing a primary database management system program corresponding to a secondary database management system program to be executed by the secondary host computer; by the primary storage subsystem, in response to a write request from a primary host computer, storing primary log information, primary database data, and primary status information, which are modified by the primary host computer based on the execution of the primary database management system program, the primary status information indicating a location of log information to be used at a time of switching transaction processing from the primary host computer to the secondary host computer; and by the primary storage subsystem, executing a synchronous remote copy about the primary log information and an asynchronous remote copy about the primary database data and the primary status information, the secondary database management system program causing the secondary host computer to: (A) read a copy of the primary status information stored in the secondary storage subsystem, created by the asynchronous remote copy; (B) based on the copy of the primary status information, decide locations on a copy of the primary log information created by the synchronous remote copy, to be used to modify a copy of the secondary database data created by the asynchronous remote copy; (C) read a part of the copy of the primary log information indicated by the locations on the copy of the primary log information; and (D) modify the copy of the primary database data according to the part of the secondary log information so that modification of a completed transaction processed by the primary host computer is stored in the copy of the primary database data.

7. A disaster recovery method according to claim 6, wherein, as to the modification by the primary host computer, the primary database management system program causes the primary host computer to: (i) modify the primary database data in the primary storage subsystem based on modification data temporarily buffered in the primary host computer; (ii) store a checkpoint acquisition log to the primary log information in the primary storage subsystem; and (iii) modify the primary status information in the primary storage subsystem after the processing of (i) and (ii).

8. A disaster recovery method according to claim 7, wherein the step (D) comprises a roll-forward processing and a roll-back processing.

9. A disaster recovery method according to claim 8, wherein the secondary database management system program causes the secondary host computer to: (E) change an operation state from another state after completion of the roll-forward processing and the roll-back processing.